161 research outputs found

    Convergence Rates of Gaussian ODE Filters

    A recently introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss--Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. This article establishes worst-case local convergence rates of order $q+1$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $q$ in the case $q=1$ with an integrated Brownian motion prior, and analyses how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we show that, in the globally convergent case, the posterior credible intervals are well calibrated in the sense that they globally contract at the same rate as the truncation error. We illustrate these theoretical results by numerical experiments which might indicate their generalizability to $q \in \{2, 3, \dots\}$. (26 pages, 5 figures)
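    A minimal sketch of this construction for the simplest case (a scalar IVP with q = 1, i.e. a once-integrated Brownian motion prior on [x, x']) is given below. The EKF0-style zero-noise observation of the derivative, the fixed diffusion parameter sigma, and the step size are illustrative choices, not the exact setup analysed in the paper.

```python
# Hedged sketch: Kalman ODE filter with a once-integrated Brownian motion
# prior (q = 1) for a scalar IVP x'(t) = f(t, x), x(0) = x0.  EKF0-style
# zero-noise observations of the derivative; sigma and h are illustrative.
import numpy as np

def ode_filter(f, x0, t0, t1, h, sigma=1.0):
    A = np.array([[1.0, h], [0.0, 1.0]])                 # IBM(1) transition
    Q = sigma**2 * np.array([[h**3 / 3, h**2 / 2],
                             [h**2 / 2, h]])             # process noise
    H = np.array([[0.0, 1.0]])                           # observe x'
    m = np.array([x0, f(t0, x0)])                        # state mean [x, x']
    P = np.zeros((2, 2))                                 # exact initial value
    ts, means, stds = [t0], [m[0]], [0.0]
    t = t0
    while t < t1 - 1e-12:
        t += h
        m_pred = A @ m                                   # predict
        P_pred = A @ P @ A.T + Q
        z = f(t, m_pred[0]) - H @ m_pred                 # residual on x'
        S = (H @ P_pred @ H.T).item()                    # innovation variance
        K = (P_pred @ H.T) / S                           # Kalman gain
        m = m_pred + (K * z).ravel()                     # update
        P = P_pred - K @ H @ P_pred
        ts.append(t); means.append(m[0]); stds.append(np.sqrt(max(P[0, 0], 0.0)))
    return np.array(ts), np.array(means), np.array(stds)

# Example: x' = -x, x(0) = 1; the posterior mean tracks exp(-t).
ts, means, stds = ode_filter(lambda t, x: -x, 1.0, 0.0, 2.0, h=0.05)
print(np.max(np.abs(means - np.exp(-ts))), stds[-1])
```

    The zero measurement noise (R = 0) means each step conditions exactly on the evaluation of f at the predicted point; richer priors (q > 1) would only change the state dimension and the matrices A, Q, and H in this sketch.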

    Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering

    Numerical analysis is the branch of mathematics that studies algorithms that compute approximations of well-defined, but analytically unknown, mathematical quantities. Statistical inference, on the other hand, studies which judgments can be made about unknown parameters in a statistical model. By interpreting the unknown quantity of interest as a parameter and providing a statistical model that relates it to the available numerical information (the 'data'), we can thus recast any problem of numerical approximation as statistical inference. In this way, the field of probabilistic numerics introduces new 'uncertainty-aware' numerical algorithms that capture all relevant sources of uncertainty (including all numerical approximation errors) by probability distributions. While such recasts have been a decades-long success story for global optimization and quadrature (under the names of Bayesian optimization and Bayesian quadrature), the equally important numerical task of solving ordinary differential equations (ODEs) has, until recently, been largely ignored. With this dissertation, we aim to shed further light on this previously neglected area in three ways: Firstly, we present a first rigorous Bayesian model for initial value problems (IVPs) as statistical inference, namely as a stochastic filtering problem, which unlocks the employment of all Bayesian filters (and smoothers) for IVPs. Secondly, we theoretically analyze the properties of these new ODE filters, with a special emphasis on the convergence rates of Gaussian (Kalman) ODE filters with an integrated Brownian motion prior, and explore their potential for (active) uncertainty quantification. And, thirdly, we demonstrate how employing these ODE filters as a forward simulator engenders new ODE inverse problem solvers that outperform classical 'uncertainty-unaware' ('likelihood-free') approaches. This core content is presented in Chapter 2. It is preceded by a concise introduction in Chapter 1, which conveys the necessary concepts and locates our work in the research environment of probabilistic numerics. The final Chapter 3 concludes with an in-depth discussion of our results and their implications.
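    As a toy illustration of this recast for a task the text cites as an established success story, the sketch below performs Bayesian quadrature: the integrand is modelled by a Gaussian process, and the integral of interest becomes a Gaussian posterior whose standard deviation quantifies the numerical error. The RBF kernel, the lengthscale, and the uniform measure on [0, 1] are illustrative assumptions.

```python
# Hedged sketch of Bayesian quadrature: estimate the integral of f over [0, 1]
# as the posterior of a Gaussian-process model of f.  Kernel and lengthscale
# are illustrative choices.
import numpy as np
from scipy.special import erf

def bayesian_quadrature(f, nodes, lengthscale=0.2, jitter=1e-10):
    """Posterior mean and std of the integral of f over [0, 1] under a
    zero-mean GP prior with kernel k(x, y) = exp(-(x - y)**2 / (2 ell**2))."""
    X = np.asarray(nodes, dtype=float)
    y = f(X)
    ell = lengthscale
    K = np.exp(-(X[:, None] - X[None, :])**2 / (2 * ell**2)) + jitter * np.eye(len(X))
    s = ell * np.sqrt(2.0)
    # Kernel mean embedding z_i = integral of k(x, x_i) over [0, 1] (closed form).
    z = ell * np.sqrt(np.pi / 2) * (erf((1 - X) / s) - erf(-X / s))
    # Prior variance of the integral: double integral of k over [0, 1]^2.
    kk = ell * np.sqrt(2 * np.pi) * erf(1 / s) + 2 * ell**2 * (np.exp(-1 / (2 * ell**2)) - 1)
    w = np.linalg.solve(K, y)
    mean = z @ w
    var = kk - z @ np.linalg.solve(K, z)
    return mean, np.sqrt(max(var, 0.0))

# Example: integral of sin(3x) over [0, 1] is (1 - cos 3) / 3, about 0.6633.
mean, std = bayesian_quadrature(lambda x: np.sin(3 * x), np.linspace(0, 1, 8))
print(mean, std)   # posterior mean and an uncertainty estimate for its error
```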

    On the Theoretical Properties of Noise Correlation in Stochastic Optimization

    Studying the properties of stochastic noise to optimize complex non-convex functions has been an active area of research in the field of machine learning. Prior work has shown that the noise of stochastic gradient descent improves optimization by overcoming undesirable obstacles in the landscape. Moreover, injecting artificial Gaussian noise has become a popular idea for quickly escaping saddle points. Indeed, in the absence of reliable gradient information, the noise is used to explore the landscape, but it is unclear what type of noise is optimal in terms of exploration ability. In order to narrow this gap in our knowledge, we study a general type of continuous-time non-Markovian process, based on fractional Brownian motion, that allows the increments of the process to be correlated. This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such processes, which gives rise to the new algorithm fPGD. This method is a generalization of the known algorithms PGD and Anti-PGD. We study the properties of fPGD both theoretically and empirically, demonstrating that it possesses exploration abilities that, in some cases, are favorable over PGD and Anti-PGD. These results open the field to novel ways of exploiting noise for training machine learning models.
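    As a rough sketch of what such a discretization can look like, the snippet below injects fractional Gaussian noise (the increments of fractional Brownian motion, sampled here via a Cholesky factor of their covariance) into a plain gradient-descent loop. The sampler, the hyperparameter names, and the way the noise scale couples to the learning rate are assumptions made for illustration, not the paper's exact fPGD.

```python
# Hedged sketch: gradient descent perturbed by fractional Gaussian noise
# (increments of fractional Brownian motion with Hurst parameter `hurst`).
# hurst = 0.5 gives independent perturbations; hurst < 0.5 anticorrelated
# increments; hurst > 0.5 persistent ones.  Details are illustrative.
import numpy as np

def fgn_increments(n_steps, dim, hurst, rng):
    """One fractional-Gaussian-noise path per coordinate (Cholesky sampler)."""
    k = np.arange(n_steps)
    gamma = 0.5 * (np.abs(k + 1)**(2 * hurst) - 2 * np.abs(k)**(2 * hurst)
                   + np.abs(k - 1)**(2 * hurst))          # fGn autocovariance
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_steps))
    return L @ rng.standard_normal((n_steps, dim))

def perturbed_gd(grad, x0, lr=0.05, noise_scale=0.1, hurst=0.5, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    noise = fgn_increments(n_steps, x0.size, hurst, rng)
    x = x0.copy()
    for t in range(n_steps):
        # Gradient step plus a (possibly temporally correlated) perturbation.
        x = x - lr * grad(x) + noise_scale * np.sqrt(lr) * noise[t]
    return x

# Double-well example: loss (x^2 - 1)^2, started at the local maximum x = 0,
# where only the injected noise can start the escape towards a minimum.
grad = lambda x: 4 * x**3 - 4 * x
print(perturbed_gd(grad, np.array([0.0]), hurst=0.3))   # typically ends near +/-1
```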

    Validation of a Three-Item Short Form of the Modified Weight Bias Internalization Scale (WBIS-3) in the German Population

    Introduction: Individuals with overweight or obesity frequently experience weight-based stigmatization. The widespread belief that weight is a matter of personal will and self-control results in various weight-based stereotypes (e.g., laziness, lack of self-discipline, or neglect). Objective: Based on the modified version of the Weight Bias Internalization Scale (WBIS-M), a short form for the economical assessment of weight bias internalization in the general population was compiled and validated. Methods: A three-item short form (WBIS-3) was derived based on data from a representative sample of the German population (n = 1,092). This new short form was validated in a second representative population sample (n = 2,513). Item characteristics and internal consistency were obtained. Measurement invariance was tested. Construct validity was established via correlations with theoretically related constructs (depression, anxiety, eating behavior, discrimination, weight status). To establish scale validity, all analyses were performed for the whole sample as well as for the subsample of individuals with overweight. Age- and gender-specific population norms were provided. Results: The WBIS-3 exhibited excellent psychometric properties. Internal consistency was α = 0.92. Strong measurement invariance was confirmed regarding age, gender, discrimination, and weight status in both the whole sample and the overweight subsample. Conclusions: The WBIS-3 constitutes a valid and economical tool for the assessment of weight bias internalization in epidemiological contexts. Measurement invariance allows for an unbiased comparison of means, correlation coefficients, and path coefficients within structural equation modeling across groups.

    An SDE for Modeling SAM: Theory and Insights

    We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. We demonstrate that these SDEs are rigorous approximations of the real discrete-time algorithms (in a weak sense, scaling linearly with the learning rate). Using these models, we then offer an explanation of why SAM prefers flat minima over sharp ones, by showing that it minimizes an implicitly regularized loss with a Hessian-dependent noise structure. Finally, we prove that SAM is attracted to saddle points under some realistic conditions. Our theoretical results are supported by detailed experiments. (Accepted at ICML 2023, Poster)
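    For reference, the discrete-time update that such continuous-time models approximate is the full-batch SAM step as it is usually written: take a gradient ascent step of radius rho to a nearby worst-case point, then descend using the gradient evaluated there. The toy quadratic loss and the hyperparameters in the sketch below are illustrative, not taken from the paper.

```python
# Hedged sketch of the discrete full-batch SAM update: ascend within an L2 ball
# of radius rho, then descend with the gradient taken at the perturbed point.
import numpy as np

def sam_step(w, loss_grad, lr=0.01, rho=0.05, eps=1e-12):
    g = loss_grad(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)   # inner ascent step
    return w - lr * loss_grad(w_adv)                  # descent at perturbed point

# Toy quadratic with one sharp and one flat direction.
H = np.diag([50.0, 1.0])                              # curvatures
loss_grad = lambda w: H @ w
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, loss_grad)
print(w)   # the sharp coordinate contracts much faster than the flat one
```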